Learning preferences for large scale multi-label problems
Although the majority of machine learning approaches aim to solve binary classification problems, several real-world applications require specialized algorithms able to handle many different classes, as in single-label multi-class and multi-label classification problems. The Label Ranking framework generalizes these settings by mapping instances from the input space to a total order over the set of possible labels. However, such algorithms are generally more complex than binary ones, and their application to large-scale datasets can be intractable. The main contribution of this work is a novel general online preference-based label ranking framework that can solve binary, multi-class, multi-label, and ranking problems. A comparison with other baselines shows its effectiveness and efficiency on a real-world large-scale multi-label task.
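The preference-based approach described above can be sketched as a simple online learner: each label keeps a linear scoring function, and a perceptron-style update fires whenever an observed pairwise preference is violated. The class name, learning rate, and update rule here are our own illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

class OnlinePreferenceRanker:
    """Minimal online pairwise-preference label ranker (illustrative sketch)."""

    def __init__(self, n_features, n_labels, lr=0.1):
        self.W = np.zeros((n_labels, n_features))  # one linear scorer per label
        self.lr = lr

    def rank(self, x):
        # Labels ordered by decreasing score w_l . x
        scores = self.W @ x
        return list(np.argsort(-scores))

    def update(self, x, preferences):
        # preferences: iterable of (a, b) meaning label a should rank above b
        for a, b in preferences:
            if self.W[a] @ x <= self.W[b] @ x:   # preference violated
                self.W[a] += self.lr * x
                self.W[b] -= self.lr * x
```

Because each update touches only the two labels in a violated preference, binary, multi-class, and multi-label supervision can all be encoded as sets of pairwise preferences over labels.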
Random forests with random projections of the output space for high dimensional multi-label classification
We adapt the idea of random projections applied to the output space, so as to
enhance tree-based ensemble methods in the context of multi-label
classification. We show how learning time complexity can be reduced without
affecting computational complexity and accuracy of predictions. We also show
that random output space projections may be used in order to reach different
bias-variance tradeoffs, over a broad panel of benchmark problems, and that
this may lead to improved accuracy while significantly reducing the
computational burden of the learning stage.
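The core idea can be illustrated in a few lines: compress the binary label matrix with a random projection, train on the compressed targets, and decode predictions back to the original label space. This is a hedged sketch using a plain ridge regressor as a stand-in base learner; the paper itself works with tree-based ensembles, and the dimensions below are arbitrary toy values.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, L, m = 200, 10, 40, 8            # samples, features, labels, projected dim

X = rng.normal(size=(n, d))
Y = (rng.random((n, L)) < 0.1).astype(float)   # sparse binary label matrix

P = rng.normal(size=(L, m)) / np.sqrt(m)       # random Gaussian projection
Z = Y @ P                                       # compressed targets (n, m)

# Ridge regression on the compressed targets (stand-in for the tree ensemble)
W = np.linalg.solve(X.T @ X + 1e-3 * np.eye(d), X.T @ Z)

# Decode: map predictions back with the pseudo-inverse, then threshold
Y_hat = ((X @ W) @ np.linalg.pinv(P) > 0.5).astype(float)
```

Training happens in the m-dimensional projected space rather than the L-dimensional label space, which is where the learning-time savings come from when m is much smaller than L.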
Food Ingredients Recognition through Multi-label Learning
Automatically constructing a food diary that tracks the ingredients consumed
can help people follow a healthy diet. We tackle the problem of food
ingredients recognition as a multi-label learning problem. We propose a method
for adapting a high-performing state-of-the-art CNN to act as a multi-label
predictor that learns recipes in terms of their lists of ingredients. We show
that our model can, given a picture, predict its list of ingredients, even if
the recipe corresponding to the picture has never been seen by the model. We
make public two new datasets suitable for this purpose. Furthermore, we show
that a model trained with a high variability of recipes and ingredients
generalizes better on new data, and we visualize how it specializes each of its
neurons to different ingredients.
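Adapting a single-label CNN into a multi-label ingredient predictor typically amounts to replacing the softmax output with independent sigmoids trained under binary cross-entropy, so that any subset of ingredients can be active at once. A minimal numerical sketch of such an output head (our own illustration, not the authors' code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multilabel_bce(logits, targets):
    # Independent binary cross-entropy per ingredient label
    p = sigmoid(logits)
    eps = 1e-9
    return -np.mean(targets * np.log(p + eps) + (1 - targets) * np.log(1 - p + eps))

def predict_ingredients(logits, threshold=0.5):
    # Unlike softmax's single winner, any subset of labels may fire
    return (sigmoid(logits) >= threshold).astype(int)
```

The per-label independence is what lets the model compose ingredient lists for recipes it has never seen: each output neuron learns to detect one ingredient regardless of which others co-occur.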
A System for Multi-label Classification of Learning Objects
The rapid evolution of e-learning is closely linked to international efforts on the standardization of Learning Objects (LOs), which provide ubiquitous access to multiple, distributed educational resources in many repositories. This article presents a system that enables the retrieval and classification of LOs and provides individualized help with selecting learning materials, so as to make the most suitable choice among many alternatives. For this classification, a special multi-label data-mining method designed for LO ranking tasks is used. The system presents the results to the end user according to each position in the ranking. The learning process is supervised, using the two major tasks in supervised learning from multi-label data: multi-label classification and label ranking.
On Aggregation in Ensembles of Multilabel Classifiers
While a variety of ensemble methods for multilabel classification have been
proposed in the literature, the question of how to aggregate the predictions of
the individual members of the ensemble has received little attention so far. In
this paper, we introduce a formal framework of ensemble multilabel
classification, in which we distinguish two principal approaches: "predict then
combine" (PTC), where the ensemble members first make loss minimizing
predictions which are subsequently combined, and "combine then predict" (CTP),
which first aggregates information such as marginal label probabilities from
the individual ensemble members, and then derives a prediction from this
aggregation. While both approaches generalize voting techniques commonly used
for multilabel ensembles, they allow the target performance measure to be taken
into account explicitly. Therefore, concrete instantiations of CTP and PTC can
be tailored to concrete loss functions. Experimentally, we show that standard
voting techniques are indeed outperformed by suitable instantiations of CTP and
PTC, and we provide some evidence that CTP performs well for decomposable loss
functions, whereas PTC is the better choice for non-decomposable losses.
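The distinction between the two schemes can be made concrete for per-label (Hamming-style) aggregation. The function names follow the paper's terminology, but the code is our own illustration, not the paper's implementation:

```python
import numpy as np

def ptc(member_preds):
    # Predict-then-combine: each ensemble member commits to a 0/1
    # prediction first; predictions are combined by per-label majority vote.
    return (np.mean(member_preds, axis=0) >= 0.5).astype(int)

def ctp(member_probs, threshold=0.5):
    # Combine-then-predict: average the members' marginal label
    # probabilities first, then derive one prediction from the aggregate.
    return (np.mean(member_probs, axis=0) >= threshold).astype(int)
```

The two can disagree: a member that is very confident about a label pulls the averaged probability in CTP, whereas in PTC it still casts only a single vote, which is exactly why the choice between them interacts with the target loss.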
Why do Sequence Signatures Predict Enzyme Mechanism? Homology versus Chemistry
We identify, firstly, InterPro sequence signatures representing evolutionary relatedness and, secondly, signatures identifying specific chemical machinery. Thus, we predict the chemical mechanisms of enzyme-catalysed reactions from “catalytic” and “non-catalytic” subsets of InterPro signatures. We first scanned our 249 sequences with InterProScan and then used the MACiE database to identify those amino acid residues which are important for catalysis. The sequences were mutated in silico to replace these catalytic residues with glycine, and then scanned again with InterProScan. Those signature matches from the original scan which disappeared on mutation were called “catalytic”. Mechanism was predicted using all signatures, only the 78 “catalytic” signatures, or only the 519 “non-catalytic” signatures. The non-catalytic signatures gave results indistinguishable from those for the whole feature set, with precision of 0.991 and sensitivity of 0.970. The catalytic signatures alone gave less impressive predictivity, with precision and sensitivity of 0.791 and 0.735, respectively. These results show that our successful prediction of enzyme mechanism is mostly by homology rather than by identifying catalytic machinery.
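The in-silico mutation step can be sketched as a small helper that replaces annotated catalytic residues with glycine before re-scanning. The toy sequence and the 1-based position convention below are our assumptions for illustration, not the study's actual data:

```python
def mutate_to_glycine(sequence, catalytic_positions):
    """Replace the residues at the given 1-based positions with glycine ('G')."""
    residues = list(sequence)
    for pos in catalytic_positions:
        residues[pos - 1] = "G"
    return "".join(residues)
```

Signatures that still match the glycine-mutated sequence cannot depend on the catalytic residues, which is how the scan separates the “non-catalytic” (homology-driven) subset from the “catalytic” one.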
Learning Interpretable Rules for Multi-label Classification
Multi-label classification (MLC) is a supervised learning problem in which,
contrary to standard multiclass classification, an instance can be associated
with several class labels simultaneously. In this chapter, we advocate a
rule-based approach to multi-label classification. Rule learning algorithms are
often employed when one is not only interested in accurate predictions, but
also requires an interpretable theory that can be understood, analyzed, and
qualitatively evaluated by domain experts. Ideally, by revealing patterns and
regularities contained in the data, a rule-based theory yields new insights into
the application domain. Recently, several authors have started to investigate
how rule-based models can be used for modeling multi-label data. Discussing
this task in detail, we highlight some of the problems that make rule learning
considerably more challenging for MLC than for conventional classification.
While mainly focusing on our own previous work, we also provide a short
overview of related work in this area. Preprint version; to appear in:
Explainable and Interpretable Models in Computer Vision and Machine Learning,
The Springer Series on Challenges in Machine Learning, Springer (2018). See
http://www.ke.tu-darmstadt.de/bibtex/publications/show/3077 for further
information.
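As a toy illustration of the rule-based view (our own formulation, not the chapter's exact representation), a multi-label rule can be seen as a mapping from a body of feature conditions to a set of predicted labels, which is precisely what makes such models readable by domain experts:

```python
def make_rule(conditions, labels):
    """Build a rule: IF all conditions hold THEN predict the given labels."""
    def rule(instance):
        if all(instance.get(feat) == val for feat, val in conditions.items()):
            return set(labels)
        return set()
    return rule
```

A key MLC-specific complication the chapter discusses is visible even here: unlike single-label rules, the head may predict several labels at once, so rules can interact and partially overlap in their label sets.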